PREFACE


The purpose of this project was to devise a quick and effective
technique for analysing language teaching methods and materials
in use in Canada.


Since the analysis was to be as objective and as complete as
possible, all measurable variables had first to be quantified. This
quantification necessitated the analysis of a volume of data too great
to be processed by hand; it was therefore evident that the work
would have to be done with the aid of a computer.


Because the same process of analysis would have to be used for
each method, the same series of operations being repeated again and
again, computer analysis of methods (mechanolinguistic method
analysis) was obviously the most appropriate and economical way of
obtaining a quick, objective and complete analysis of such a large
amount of language teaching material.


Fortunately, much work toward a quantitative analysis of methods
had already been done; it remained to translate this into computer
programs and to produce a prototype of mechanolinguistic analysis. This
was the expressed objective of the research project. It was felt,
however, that an isolated prototype, no matter how perfect, would not
give a sufficiently clear idea of the sort of comparisons made
possible by the analysis. It was therefore decided, at the risk of delaying
the final report, to put a second method through the computer analysis




already developed for the first and to present both results side by
side.


For the first method analysed by computer, we selected the French
course most widely used in the Canadian Civil Service, viz., Voix et
Images de France (Method a); for the second, we chose a comparable
audio-visual course which had been used in certain government
departments, viz., French through Pictures (Method b).


Before any of this material could be put into the computer, three
types of computer programs had first to be elaborated: data control
programs, language analysis programs and method analysis programs.


The data control programs were based on available language
statistics for spoken French, including those of the Centre de recherches
pour la diffusion du français, which supplied some as yet unpublished
frequency figures. This material was included in the data control
programs, the purpose of which was to supply a category and numerical
value for each item in the language likely to be found in the method.


The language analysis programs were based on the most formal
procedures of analysis available for French. They included an automatic
grammar which had to be specially devised for the purpose.


The method analysis programs were elaborated from parameters
established in an earlier study (1961) published in London in 1965
(Language Teaching Analysis. London: Longmans).


Completion of these three types of mechanolinguistic programs was
complex and time-consuming. Much time had to be spent in the
preparation of such things as frequency dictionaries, structural grammars,
and data searching sequences, in the perfection of component programs
by various stages of approximation, the redesigning of procedures
within the limited memory capacity of the computer, etc. When this
complex of programs had been completed, each of its stages had to be
verified, and often re-written, until all stages of analysis were
sufficiently perfect and well co-ordinated to produce an accurate prototype
analysis of a single method. When this was achieved, it was a
relatively simple matter to produce the analysis of a second method, since we
had now reached the stage where our automatic method analysis was
functioning.


As a result, it is now possible to make a rapid, automatic and 
detailed objective analysis of all methods and materials used in Canada 


for the teaching of the second language. 


The complex of mechanolinguistic method analysis programs now
available is the result of carefully co-ordinated teamwork. The
computer programs were the work of Michael Mepham, as were the graphic
representations of the results. Much of the work on the data control
material was done by Lorne Laforge, Jean-Guy Savard, Jean-Marie Courtois,
Flore Gervais, Monique Benoit, Pierre Cardinal, Gerald McNulty and
Michèle Crevitre. The technical assistance of Louis Robichaud, Pierre
Ardouin and Roger Miville-Deschênes also contributed to the success of
this venture.

Introduction


The choice of language teaching methods has always been a matter
of opinion rather than of fact. Departments of education have up to
now simply relied on the opinions of language teachers. These persons,
often excellent classroom teachers with good judgment, could
nevertheless not be aware of all the possible methods, nor did they have the
techniques of analysis to enable them to pass an objective judgment
on those which came to their attention.


Every year thousands of learners are consequently introduced
to the study of the second language through methods which are not
the most suitable. In some provinces, the methods have first been
"experimented" in classes where students are reported to have done
"someone's idea of well on someone's idea of a valid test."


The failure of experimental learning situations in language
teaching method analysis is now recognized as being due to the
multiplicity of factors in the method, the teachers and the learners. Only
after each complex of factors has been isolated, analysed and
quantified can a technique be evolved for determining with any
certainty the most suitable language learning method for any given group.


The most important and the most neglected in the analysis of
these three components is the method itself. Only after all the
relevant factors in a method have been analysed and quantified is it
possible to determine the extent to which each relates to any teaching
or learning situation.

The techniques for isolating and quantifying these factors in
the analysis of methods had already been elaborated before the project
began. They were grouped according to four fundamental pedagogical
questions:

1. What elements are taught? (selection).

2. When are they introduced? (gradation).

3. How are they introduced? (presentation).

4. How are they exercised? (repetition).


A number of the measurements devised could be automated; that
is to say, effected on a digital computer. We proposed to reduce
human intervention in the execution of the analysis while producing
a more detailed and rigorous description than could be undertaken
manually.

2.- The Description of the Method


Of the possible variables, only those which belong to one or more 
of the pedagogical factors are considered. Of these, some are more 
easily evaluated manually. For the purposes of the description, the 
variables which can be measured automatically are grouped under the 


factor they most effectively describe. 
2.1.- Selection


The value of the method depends on the number of different
elements, the nature of these elements and their utility within the
language.
2.1.1.- Quantity


The quantity is measured by the number of elements by grammatical 


category. 
2.1.2.- Proportion


The nature of the elements is described globally by the relative
proportion of the total number of elements within each category.
2.1.3.- Utility


The usefulness of the vocabulary words selected is measured globally
by the frequency and range of these words in free speech.
The values assigned are drawn from "L'Elaboration du Français
Fondamental" (Gougenheim, G., et al., Paris, Didier, 196).
Gradation 


The order and rate of introduction of each new element may vary
from one method to another. To evaluate the gradation, we measure the
intake, density and productivity.
Intake 


The intake is measured for a standard section of text by the
proportion of new elements within the section to the accumulated
number of different elements, this for each category of elements.
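
The intake measurement can be sketched as follows. This is only an
illustration under stated assumptions: the text is already cut into
sections, a single category of elements is treated, and the accumulated
number of different elements is taken to include the current section.
The function name and data layout are hypothetical.

```python
def intake_by_section(sections):
    """Return, for each section, new elements / accumulated different elements.

    sections: list of lists of elements (one list per section of text).
    Assumption: the accumulated total includes the current section.
    """
    seen = set()
    ratios = []
    for section in sections:
        new = set(section) - seen      # elements met for the first time
        seen |= new
        ratios.append(len(new) / len(seen) if seen else 0.0)
    return ratios


intake = intake_by_section([["a", "b", "a"], ["b", "c"], ["a", "c"]])
```

Run once per category of elements, the same function would give the
intake for vocabulary words, functional words, and so on.
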
Density


The density is defined as the number of new elements per sentence.
It is measured globally for the method by the proportion of the
sentences with density zero, one, two, three or more.
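
A minimal sketch of the density distribution, assuming each sentence is
given as a list of already identified elements. The grouping of all
densities of three or more into one class follows the definition above;
the names used are hypothetical.

```python
from collections import Counter


def density_distribution(sentences):
    """Proportion of sentences with 0, 1, 2, or 3-or-more new elements."""
    if not sentences:
        return {d: 0.0 for d in range(4)}
    seen = set()
    counts = Counter()
    for sentence in sentences:
        new = set(sentence) - seen
        seen |= new
        counts[min(len(new), 3)] += 1   # class 3 stands for "three or more"
    return {d: counts[d] / len(sentences) for d in range(4)}


dist = density_distribution([
    ["je", "vois", "la", "maison"],
    ["je", "vois", "la", "porte"],
    ["je", "vois", "la", "maison"],
])
```
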
Productivity 


The productivity is defined here as the number of possible variant
realizations of a structure given the selected number of constituent
elements. The structures can be assessed individually and by category
at different points within the method. The productivity may be measured
as defined, or by some function of the defined variable. We use a
logarithmic function in order to reduce the scale of the measurements
and to simplify the calculation.
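
The effect of the logarithmic function can be shown with a short sketch.
It assumes that the number of variants of a structure is the product of
the numbers of different elements available for each constituent, so
that the logarithm becomes a simple sum. The base of the logarithm is
not stated in the report; base 10 is an assumption here.

```python
import math


def log_productivity(slot_type_counts):
    """Sum of log10 of the type counts, one count per constituent slot."""
    if any(n <= 0 for n in slot_type_counts):
        raise ValueError("every constituent needs at least one element")
    return sum(math.log10(n) for n in slot_type_counts)


# Hypothetical structure: article (3 types) + noun (40) + verb (10)
value = log_productivity([3, 40, 10])
```

The product 3 x 40 x 10 = 1200 variants is thus replaced by the sum of
three small numbers, which keeps the scale of the measurements down.
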
Presentation


To measure the means by which a method presents its material, it
is necessary to be able to distinguish the different kinds of material
within the method. Manual inclusion of coded markers, called "sigla",
before the automatic analysis, permits us to measure certain aspects
of the presentation.
Introduction 


The number and proportion of the elements are counted according
to their first occurrence in three kinds of textual material: syntactic
presentation, syntactic repetition, or non-syntactic.
Contextualisation 


The amount and proportion of material is measured according to
which procedure for contextualising the meaning is favoured. Pictorial
contextualisation depends on the different kinds of pictures;
differential, on the use of the mother tongue in explanations and
translations; and verbal, on the different literary forms of the text.


Repetition 


The repetition in a method depends on the number of occurrences
of each element. It may be measured for an element by the total number
of occurrences; for a group of elements by the total and by the
average repetition per element. The elements are grouped by category,
medium, skill and type of exercise.


Category 

The total and average repetition are measured for each category 
of elements, once for all the original material of the method, and 
once again for all the original and duplicated material. 
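
The total and average repetition by category can be sketched as below.
The representation of the text as (element, category) pairs and the
category labels are assumptions made for the example.

```python
from collections import Counter, defaultdict


def repetition_by_category(tokens):
    """Return {category: (total occurrences, average occurrences per element)}."""
    occurrences = Counter(tokens)          # (element, category) -> count
    totals = defaultdict(int)
    types = defaultdict(int)
    for (_, category), n in occurrences.items():
        totals[category] += n
        types[category] += 1
    return {cat: (totals[cat], totals[cat] / types[cat]) for cat in totals}


repetition = repetition_by_category([
    ("maison", "noun"), ("voir", "verb"), ("maison", "noun"),
    ("porte", "noun"), ("le", "article"),
])
```

Running the function once on the original material and once on the
original plus duplicated material would give the two sets of figures
described above.
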
Media 

The total word repetition is distributed according to the
different media of the method: manual, reader, exercise book, magnetic
tape, etc.
Skill 

The total word repetition is distributed according to the skill 
involved: reading, listening, speaking, writing. 
Type of exercise 


The word repetition is measured within each type of exercise: 


rote, incremental, variational, operational. 
Considerations Basic to Data Processing


The adoption of data processing techniques obliged us to organize
the project according to the resources and limitations of these
techniques.
The Equipment 


At "Le Centre de traitement de l'Information de l'Université
Laval", besides the conventional accounting machinery, we had an IBM-1410
computer at our disposal. The installation included a card-reader, six
magnetic tape units and an IBM-1403 printer. The memory had a capacity
of 60,000 character positions.
The Control of the Computer Operations 


The instructions according to which the method was manipulated
were written in the symbolic programming language Fortran. A number
of general operations were already available in the form of
sub-programs that could be incorporated in larger instruction sequences
(programs) prepared explicitly for the project.


In addition, a number of programs furnished with the machine were
at our disposal. These programs, henceforth referred to as "control
programs", were designed for efficient execution of commonly used
operations. In particular, we used control programs to put data from
punched cards onto magnetic tape, to print data from magnetic tape onto
paper, and to sort data stored on tape.

The Punching of the Method onto Cards.


All material to be furnished to the computer had to be punched
initially onto cards. The punch operator's work is that of a typist:
to copy the material as it is presented to him. For this reason, all
of the method to be treated had to be in written form.
The Editing of the Method 


In order to exclude unwanted material and include material not in
written form, the method had to be revised manually before being punched
onto cards. Unwanted material was barred, and additional material was
written in where appropriate. For instance, in order to ensure the
identification of pedagogically distinct portions of the text, specially
coded "sigla" were inserted in the text.
The Programs 


The instructions controlling the computer operations on the
material of the method had to be formulated in advance. The instructions
were grouped into a number of distinct programs rather than in one
complete program, for the reasons enumerated:

1. One program would be unwieldy: it would be difficult to
   prepare, to test, and to modify.

2. One program would be inefficient: in order to use it on
   the computer with its limited storage capacity, the
   operations would have to be conceived to economise on
   storage space rather than on execution time.

3. With several programs, we can put the control programs
   already available to effective use. This saves
   programming and testing time for the operations thus implemented.

4. With several programs, it is possible to judge the results
   at each stage before going on to the next. Modifications
   and corrections can be incorporated into the succeeding
   programs.

Each program had to control several types of activity. It had
to enter the material of the method (input), work on this material,
then store the results (output). The phase where the material is
worked on includes all the operations essential to the analysis of the
method. Thus the algorithms or logical processes of the analysis are
formulated within the programs. First, however, they had to be
conceived.

Each program must contain all the information upon which its
functions are dependent, or have access to such information. That is,
the information may be contained in the instructions of the program, or
in tables which can be consulted by the instructions. It was decided
to include as much as possible of the needed grammatical data in
tables punched on cards or stored on tape. In this way, the programs
would be independent of the grammar. The latter could then be modified
or even replaced by that of another language without necessitating
changes in the instructions which use it.


The Preparation of the Method. 


The pedagogical material of the method was prepared and
subsequently punched onto cards. Both operations were carried out
manually according to prescribed directions.
The Pre-Editing of the Method.


Certain parts of the manuals were barred as not pertinent
to the analysis. These include:

1. text not in the language being taught,

2. grammatical and lexical lists at the back of the manuals,

3. characters not making up words,

4. text in phonetic script,

5. chapter and exercise numbering,

6. wrong choices in multiple choice exercises,

7. exercise keys at the back of the manuals.


In order to ensure the correct pedagogical ordering of material
found in several manuals, the numbering of the pages was modified.
Pages of text to be inserted in the principal manual were given the
page number of the preceding page, plus a sub-page number from


et tO ove 


Multiple choice and fill-in-the-blanks exercises were completed
manually.


To ensure the identification of the pedagogical units of
presentation, these were identified by special "sigla". The sigla were
coded according to Table 1 and inserted according to the
following rules:

1. sigla at the beginning of each pedagogically homogeneous
   passage,

2. one siglum for each medium of presentation of the passage,

3. one siglum for each series of pedagogically homogeneous
   pictures.

4.2.- The Punching of the Method onto Cards.


The text was punched onto cards on a machine with a keyboard of
47 characters. In order to accommodate the letters and punctuation
marks encountered in the texts, the following rules were followed:

1. for each alphabetic character, whether small, capital
   or accented, the unique corresponding keyboard letter,

2. for each accented letter, a number immediately following
   it according to Table 2.- (a),

3. for each punctuation mark, one or several characters
   according to Table 2.- (b),

4. for each space, a blank.
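
Rules 1 and 2 can be illustrated by the following sketch, which uses the
accent codes of Table 2.- (a): acute 1, circumflex 2, grave 3, diaeresis
4, cedilla ignored. Two points are assumptions made for the example:
the keyboard letters are rendered as capitals, and the punctuation codes
of Table 2.- (b) are left out, so punctuation is passed through
unchanged.

```python
import unicodedata

# Combining marks mapped to the digit punched after the letter (Table 2.- (a)).
ACCENT_CODES = {
    "\u0301": "1",   # acute
    "\u0302": "2",   # circumflex
    "\u0300": "3",   # grave
    "\u0308": "4",   # diaeresis
    "\u0327": "",    # cedilla: ignored
}


def punch_encode(text):
    """Transcribe accented text into letter-plus-digit punch characters."""
    out = []
    # NFD splits each accented letter into base letter + combining mark.
    for ch in unicodedata.normalize("NFD", text):
        if ch in ACCENT_CODES:
            out.append(ACCENT_CODES[ch])
        else:
            out.append(ch.upper())   # assumption: single-case keyboard
    return "".join(out)


encoded = punch_encode("l'élève")
```
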


The text was punched onto cards according to the following rules:

1. 72 letters and blanks per card in the card columns
   1 to 72,

2. continuity from one card to the next in conformity
   with rules 3 and 4,

3. at least one blank space or punctuation mark between
   words,

4. no blanks within a word.

The cards were assigned page numbers in the order of presentation
of the material. Within each page, the cards were numbered
consecutively from 1 up. The page and line numbers occupied the card
columns indicated below:

    columns 73, 74, 75    page number
    column 76             inserted page number
    columns 77, 78        line number
    column 79             overflow line number.
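
The card layout above can be made concrete with a small sketch that
splits an 80-column card image into its fields. The field names and the
sample card are hypothetical; only the column positions come from the
text.

```python
from typing import NamedTuple


class Card(NamedTuple):
    text: str            # columns 1-72
    page: str            # columns 73-75
    inserted_page: str   # column 76
    line: str            # columns 77-78
    overflow: str        # column 79


def parse_card(image):
    """Split one card image into text and numbering fields."""
    image = image.ljust(80)          # pad short images with blanks
    return Card(
        text=image[0:72],
        page=image[72:75],
        inserted_page=image[75],
        line=image[76:78],
        overflow=image[78],
    )


card = parse_card("LE GARCON EST ICI.".ljust(72) + "012" + " " + "07" + " ")
```
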


TABLE 1.- (a) Pictorial Sigla Codes


Type of picture Column 
1 2 3 
with caption p 
without caption P 
for distribution b 
for display x 
in textbook 4 
in exercise book e 
in reader Pg 
wall picture h 
picture card a 
flannelgraph g 
slides i 
film strip x 
motion picture c 


number (example) 
TABLE 1.- (b) Textual Sigla Codes


Type of text Column 


manual b 
display material Ww 
recorded on tape t 
recorded on disk d 


recorded on film £ 


orthographic script o 
phonetic script Pp 
recorded, unspaced r 


recorded, spaced s 


prose p 
verse Vv 
song with music m 
dialog d 
isolated sentence 8 
isolated phrases By 
isolated words | Ww 


caption or title y 


for reading r 
for listening al 
explanation, differential x 


explanation, vernacular a 


TABLE 1.- (b) Textual Sigla Codes


Type of text 


imitation, copying 
dictation, transcription 
for reading aloud 
incremental imitation 
completion

multiple choice 

alteration 

paraphrasing, rephrasing 
question and answer 

oral composition 

written composition 
translation from vernacular 
translation into vernacular 
grammar 


lexical list 


syntactic presentation 
syntactic repetition 
non-syntactic


omitted from cards


Column 
TABLE 2.- Diacritic Transcription


Text character          Punch character


(a) Accents


ç (cedilla)             ignored
´ (acute)               1
^ (circumflex)          2
` (grave)               3
¨ (diaeresis)           4


(b) Punctuation


( ( 
) ) 
? $ 
j rie 
: s 
"4 « (C 
“)) )) 


** 


see 
The Analysis


The organization of the analysis is outlined graphically in Figure
1. The series of programs includes the following basic steps:

1. the identification of the graphic elements: words,
   punctuation, sigla,

2. the identification of the phrase structures,

3. the identification of the clause and sentence structures,

4. the counting of the elements identified according to
   their different classes,

5. computation of the measurements dependent upon the
   elements and their counts, and

6. the presentation of the results in lists, tables, and
   graphs.


Cards to Tape 


The material of the method was transferred from punched cards
to magnetic tape with the IBM-1410 and the control program "Cards
to Tape". The contents of twenty cards were grouped within each
tape record. The storage tape was labelled "Text".
Program P 1: Word Separation


The words contained in the tape records of "Text" were
separated into individual records. Punctuation and sigla were treated as
separate words.


Each record of one word was assigned the page and line number 


of the corresponding card. 


An order number was assigned to each record. This number
corresponded to the consecutive integer for each word, with an increment of
0.2 for each punctuation mark.

Each record was assigned a duplication number corresponding to
the number of sigla identifying the passage of the text.


Each record was assigned a presentation number based on the fifth 
position of the sigla code. This distinguishes text for presentation, 


text for repetition, and material not in sentence form. 


The new records containing sigla were stored consecutively on
the tape labelled "Text Sigla". Each record was punched onto a card
for permanent storage.


The records containing words and punctuation were put onto the 


tape labelled "Textual Words". 
Program P 2: Word Identification 


The records of "Textual Words" were sorted in alphabetical
order with the control program "Sort and Merge". The resulting tape
was called "Alphabetic Words".


The words and punctuation were identified by comparison with
the prepared lists "Vocabulary Words" and "Functional Words".


The information contained in the input records was transferred 


to the output records. 


In addition, certain data contained in the lists was
transferred to the identified elements. This includes a grammatical category
number, a word identity number, and, for certain words with multiple
functions, an ambiguity number.


For vocabulary words with regular grammatical endings, a new
record containing the ending was produced. The ending was identified
in the list "Grammatical Endings" and assigned the pertinent data
as for the words. For each different vocabulary word the
corresponding utility parameters of the "Vocabulary List" were accumulated by
word category. The totals were assigned to a record with fictitious
identity and order numbers to distinguish them from the words.


The records containing words, punctuation, endings and utility
counts were put onto the tape "Alphabetic Identities".


The first occurrences of unidentified elements were punched onto
the cards "Unidentified Words" with their input information.


A list of each different type of input element with the output
data of its first occurrence and a count of the number of its
occurrences was printed under the title "Identified Words".
Correction of Word Identities 


The words in the card deck "Unidentified Words" were identified
manually; that is to say, they were assigned numbers analogous to
those of the identified words. The identifying numbers were punched
onto the cards now called "Word Corrections".


The list "Identified Words" was checked manually to verify the
identification and the spelling of the words. All modifications were
punched onto cards and included in the deck "Word Corrections".


5.5.- Program P 3: Word Correction


The punctuation, words, and grammatical endings of the card deck
"Word Corrections" were compared sequentially with those contained on
the tape "Alphabetic Identities". The corresponding items were
modified, with extra items being added as indicated on the cards. The
output tape was called "Correct Identities".
5.6.- Program P 3.5: Concordance Counts


The records of "Correct Identities" were sorted by the control
program "Sort and Merge" according to the hierarchy: category number,
alphabetic order, textual order number. From the resulting
concordance, on the tape labelled "Concorded Identities", was selected the
first record for each different type of punctuation, graphic word,
and grammatical ending. The number of tokens of each type of
element was counted and included in the output record on the tape
"Identity Types".


A list, printed directly from the tape, included the following
information for each item: category number, identity number, page and
line number of the first occurrence, order number, and token count.
The latter gives the amount of repetition for each type. The list
kept the same order as the tape, that is, first by category, then
by alphabetic order.
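
The sort-then-select step can be sketched as follows: records are
ordered by category, alphabetic form and textual order, and the first
record of each type is kept together with its token count. The record
layout (dictionaries with `category`, `form` and `order` keys) is an
assumption made for the example and not the tape format used in the
project.

```python
from itertools import groupby


def identity_types(records):
    """Keep the first token of each type and attach the token count."""
    ordered = sorted(records, key=lambda r: (r["category"], r["form"], r["order"]))
    types = []
    for _, group in groupby(ordered, key=lambda r: (r["category"], r["form"])):
        tokens = list(group)
        first = dict(tokens[0])            # earliest occurrence in the text
        first["token_count"] = len(tokens)
        types.append(first)
    return types


records = [
    {"category": 2, "form": "VOIR", "order": 3},
    {"category": 1, "form": "MAISON", "order": 5},
    {"category": 1, "form": "MAISON", "order": 2},
    {"category": 1, "form": "PORTE", "order": 7},
]
types = identity_types(records)
```
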
Program P 4: Phrase Identification


The records of "Correct Identities" were sorted in textual order
onto the tape "Textual Identities". Next, in one pass over the
identified tokens in their textual order, we resolved the functional
ambiguities of word identities, counted the types and tokens of the words
and grammatical endings, identified the phrase structures, and prepared
the sequence of phrases found in each sentence.


The functional ambiguities of certain problem words were specified
by the ambiguity number at the moment the word was identified (P 2).
This number refers us to a sequence of rules contained in the card
deck "Ambiguity Rules". The rules give the dependence of the
category and identity numbers on the neighbouring elements. Both numbers
are modified as indicated by the rules.


The number of types and tokens are counted by category of elements 
for each block of 500 textual words at each blocking point. The counts 


were put onto a record of the tape "Identity Counts". 


The distribution of the types of element in the whole method was
counted according to their introduction. That is, the number of first
occurrences of vocabulary words, functional words, and grammatical endings
was counted according to the presentation number assigned at word
separation (P 1). The counts were included in the final record of
"Identity Counts".


The utility counts accumulated during the word identification (P 2) 


were transferred to the final record of "Identity Counts”. 
The phrases were identified by comparing maximal sequences of
word and punctuation categories in textual order with the phrase
structures stored on the cards "Phrase Dictionary". The identified phrases
were assigned category and identity numbers, and the page, line, and
duplication numbers of the terminal punctuation of the sentence. The order
number of the terminal punctuation was given the increment 0.3 for
the output record of each phrase. The output tape was called "Phrase
Structures".
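
One way to read "comparing maximal sequences" is a greedy longest-match
search against the dictionary, sketched below. This reading is an
assumption; the report does not give the matching procedure in detail,
and the category codes and dictionary entries shown are invented for
the example.

```python
def identify_phrases(categories, phrase_dictionary):
    """Segment a category sequence by greedy longest match.

    phrase_dictionary maps tuples of category codes to a phrase code.
    Positions where nothing matches yield None.
    """
    longest = max(map(len, phrase_dictionary), default=0)
    phrases, i = [], 0
    while i < len(categories):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(longest, len(categories) - i), 0, -1):
            key = tuple(categories[i:i + size])
            if key in phrase_dictionary:
                phrases.append(phrase_dictionary[key])
                i += size
                break
        else:
            phrases.append(None)     # unidentified element
            i += 1
    return phrases


dictionary = {
    ("ART", "NOUN"): "NP",
    ("ART", "ADJ", "NOUN"): "NP",
    ("VERB",): "VP",
}
result = identify_phrases(["ART", "ADJ", "NOUN", "VERB", "ART", "NOUN"], dictionary)
```
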


The category numbers of the identified phrases were put into
sequence for each sentence. Each phrase sequence was assigned the same
information as the phrases making it up, less the category and identity
numbers. The output records were put onto the tape "Textual Sequences".


For each sentence we counted the number of new types of vocabulary 
word, functional word, and grammatical ending. The three counts were 


included in the records of "Textual Sequences." 
Program P 5: Clause Identification


The phrase sequences were broken down into clause structures which
were in turn identified and grouped into sentence structures. The
algorithm for cutting the phrase sequences called upon rules contained
in the card deck "Clause Rules" to determine which of the phrase
sequence elements delimited different clauses of the sentence.

The cutting algorithm had to be implemented only once for each
series of identical sequences, the resulting clauses being replicated
for the appropriate number of sequences. Hence it was economical to
first sort "Textual Sequences" in order of the sequences' alphabetic
value onto a new tape "Alphabetic Sequences".
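
The economy gained by sorting first can be sketched as below: identical
sequences become adjacent, the cutting routine is called once per
distinct sequence, and its result is copied to every sentence sharing
that sequence. The cutting rule used here (`split_on_conjunction`) is a
stand-in invented for the example, not the content of "Clause Rules".

```python
from itertools import groupby


def cut_all(sequences, cut):
    """Apply `cut` once per distinct sequence and replicate the result.

    sequences: list of (order_number, sequence) pairs, sequence as a tuple.
    Returns the clauses per order number and the number of calls to `cut`.
    """
    calls = 0
    results = {}
    ordered = sorted(sequences, key=lambda item: item[1])
    for sequence, group in groupby(ordered, key=lambda item: item[1]):
        clauses = cut(sequence)
        calls += 1
        for order_number, _ in group:
            results[order_number] = clauses
    return results, calls


def split_on_conjunction(sequence):
    # Stand-in rule: the code "C" delimits clauses.
    return [part for part in "".join(sequence).split("C") if part]


results, calls = cut_all(
    [(1, ("N", "V")), (2, ("N", "V", "C", "N", "V")), (3, ("N", "V"))],
    split_on_conjunction,
)
```
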


Each clause structure was assigned a category number depending
on its first element. The sequence of clause category numbers of a
sentence gave the sentence structure, which was assigned a category
number depending on the number of clauses it contained. The input
information with the phrase sequences was transferred to the clause
and sentence structure output records. For the clauses, the order
number was given an increment of 0.4; for the sentences, 0.5. The
output records were put on the tape "Clauses and Sentences".
Program P 6: Structure Concordance 


For the input, the tapes "Phrase Structures" and "Clauses and
Sentences" were sorted according to category number, alphabetic value
of the token, and order number onto the tape "Categorized Structures".
Each different clause and sentence structure was assigned an identity
number, and put onto the tape "Concorded Structures".


The first token of each structure was put onto the tape "Structure
Types". Included in each output record was the information of the
first token plus the number of tokens of the structure type. A printed
list of the output tape gave us the structure types in order of
category and alphabetic value of the structure. The information displayed
included the category and identity numbers, the page and line numbers
of the first occurrence, the order number and the token count.
5.10.- Program P 7: Structure Counts


The structures of "Structure Types" were sorted in textual order
onto the tape "Textual Structures". They were counted in this order
for the number of types and tokens within each category for each section
of text of 500 word occurrences.


Each new structure type was counted according to the presentation 
number of its first occurrence. This introduction count was done for 


the whole manual. 


The number of new phrase, clause and sentence structure types was
counted by sentence. Each sentence was then counted according to its
number of new vocabulary words, functional words, grammatical endings,
and structures. These density counts were accumulated for the whole
method.


At each 500-word-occurrence blocking point, the corresponding
record of "Identity Counts" was read in. The type-token structure
counts and the productivity counts were combined with the input counts
for the words and grammatical endings for the output records of "Counts".

The productivity of the structures was calculated in terms of
their immediate constituents. The logarithm of the type counts for
each element of each structure was accumulated for the phrases, for
the clauses, and for the sentences.
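The accumulation of logarithms can be illustrated with a short sketch. It assumes that the productivity of a structure is the product of the type counts of its elements, so that its logarithm is the sum of the logarithms; the base 10 is the one named later in the report. The function names and the sample type counts are hypothetical.

```python
import math

def productivity_degree(structure, type_counts):
    """Sum of log10 of the type counts of each element of one structure."""
    return sum(math.log10(type_counts[element]) for element in structure)

def total_degree(structures, type_counts):
    """Accumulate the degrees over a list of structures (e.g. all phrases)."""
    return sum(productivity_degree(s, type_counts) for s in structures)

# Hypothetical type counts: 10 articles and 100 nouns in the method.
type_counts = {"ARTICLE": 10, "NOUN": 100}
degree = productivity_degree(("ARTICLE", "NOUN"), type_counts)
```

With these sample counts the phrase ARTICLE + NOUN has a degree of 3, that is, about 1,000 possible combinations.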


The introduction and density counts were included in the final
record of "Counts".
Program R 1: Block Results


The data contained in "Counts" was manipulated to give part of 
the measurements defined in chapter II. The results were printed out 
in tables. The headings and other worded indications in the table were
read in from the prepared card deck "Block Titles".


Included in the results for each block were the intake and
productivity of the gradation (ref. 2.2.). For the method as a whole,
results were produced for the quantity, proportion, and utility of the
selection (ref. 2.1.), the density of the gradation (ref. 2.2.),
the introduction of the presentation (ref. 2.3.1.), and the repetition
by category (ref. 2.4.1.).
Program R 2 (a & b) : Sigla Analysis 


The list of sigla printed out after the word separation (P1) was
verified manually. Any errors or modifications were taken up in the
card deck "Text Sigla". The corrected deck served as data for tabulations
of presentation and repetition. The number of word occurrences governed
by each siglum was given by the difference between the order numbers of
consecutive sigla. The occurrences were accumulated according to each of
the coded characters in the siglum. The totals were stored on the tape
"Sigla Tables".
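The counting by difference of order numbers can be sketched as follows. This is an illustration only: the representation of a siglum as a short string of coded characters, the function name sigla_totals and the sample values are assumptions, and the treatment of the last siglum (governing the text up to the final word) is a guess about a detail the report does not state.

```python
from collections import Counter

def sigla_totals(sigla, last_order_number):
    """Accumulate governed word occurrences per coded character.

    sigla: list of (siglum, order number of the word preceding it),
           in textual order.
    last_order_number: order number of the last word of the text.
    Returns a Counter keyed by (position in siglum, coded character).
    """
    totals = Counter()
    for k, (siglum, start) in enumerate(sigla):
        # The next siglum (or the end of the text) closes the governed span.
        end = sigla[k + 1][1] if k + 1 < len(sigla) else last_order_number
        governed = end - start
        for position, char in enumerate(siglum):
            totals[(position, char)] += governed
    return totals

# Hypothetical two-character sigla: "AB" governs 10 words, "AC" governs 15.
totals = sigla_totals([("AB", 0), ("AC", 10)], last_order_number=25)
```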


The final results of the sigla analysis were organized and printed
out in tables by the program R 2 (b). The card deck "Sigla Titles"
controlled the organization of the tables and supplied the headings as
well.


The results included the contextualization of the presentation
(ref. 2.3.2.), and the repetition by media, skill, and type of exercise.

6.- The Input Data


A great deal of linguistic data had to be prepared for use in the
analysis. The data was formulated for compatibility with the previously
outlined analytic procedure. An effort was made, however, to keep
the two distinct so that both the data and the procedure could be
modified independently. Because of the original nature of the project,
much of the work was tentative. We anticipated the lengthening of the
data lists and the inclusion of additional data. All but the most
voluminous data was kept on cards where it could be readily changed.


A number of output procedures and formats were governed by card 
data. This allowed us to print the results in any chosen order 


with English or French headings. 


The coordination between programs, input data, and output results 


is demonstrated in Figure 1.


6.1.- Vocabulary Words


This list contains about 800 vocabulary words from among those 
most frequently used in French. The data for each entry consisted of 


numbers characterizing the following entities: 


1.- the grammatical category,
2.- the word root identity,
3.- the functional ambiguity,
4.- the root length,
5.- the frequency (ref. 2.1.3.),
6.- the range (ref. 2.1.).

Because of its length and the stability of its information, the
list was put on magnetic tape.


The tape records were sorted in alphabetical order in preparation 


for use by the program P 2. 


Basically, only one form of each word was included in the list.
For certain words, however, the inflected forms corresponded to
alphabetical positions differing widely from the listed form. To simplify
the search procedures of program P 2, extra forms were included,
especially for the verbs, where the separation exceeded three list
positions.


Grammatical Endings 


About 80 of the most regular endings were stored on cards. Each
was assigned numbers to characterize its identity and its grammatical
category. They were listed in alphabetical order by category.
Functional Words 


This list contains the functional words, the punctuation marks,
and any vocabulary words not in "Vocabulary Words" but foreseen as
needed (e.g., proper nouns).

The words were put on punched cards with identity and category
numbers. The prepared deck was in alphabetic order.

There are several reasons why this list was kept separate from
"Vocabulary Words":

1 - A simpler and faster search procedure is possible for
elements with no flectional variants.

2 - The number of elements is relatively limited: about
300 functional word forms and about 15 combinations
of punctuation marks.

3 - The classification of the functional words presents
special problems. The categories needed for the analysis
depend largely on the analysis itself. Functional
ambiguities are numerous.

4 - The combinations of basic punctuation marks encountered
vary from one method to another. Their utilization,
hence their grammatical function, varies as well.

5 - Extra words can be added at will.
6.4.- Ambiguity Rules


To each word type characterized by an ambiguity number corresponds
a sequence of tests and operations encoded on punched cards.
The rules define the correspondence between the textual environment of
an ambiguous element and the changes to be made in the identification
of the elements. For each ambiguity number there is a card containing
one rule. Each rule may call upon another, thus building up sequences
of tests that can explore beyond the elements immediately before and
after the one in question.


Each rule contains up to six numbers which occupy specified
positions in the card:

1 - The ambiguity number characterizes the rule and specifies
whether the operation applies to the element in question,
the preceding element, or the following element.

2 - The operator number specifies the operation:

a) test for equality of the operator number with the
tested element's category number,

b) test for equality of the operator number with the
tested element's identity number,

c) eliminate the tested element,

d) replace the tested element,

e) insert an element after the tested element.

3 - Three numbers (3rd, 4th, 5th) give the new category, identity,
and ambiguity numbers to be assigned by the operation, if
appropriate. The two test operations (a & b) call for these numbers
only if the test is positive.

4 - The sixth number is the ambiguity number of the next rule to
be invoked. It does not apply after a positive test.
About 100 rules were included for the first analysis. They were 


ordered by increasing ambiguity number. 
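The chaining of rules may be easier to follow in a small sketch. It is not a reconstruction of the card format: the field names, the separation of the operator into a test kind and an operand, the category numbers, and the decision to assign the new numbers to the element in question are all assumptions made for the illustration. Only the two test operations (a and b) are shown; elimination, replacement and insertion are left out.

```python
from dataclasses import dataclass

@dataclass
class Element:
    category: int
    identity: int
    ambiguity: int      # 0 stands for "not ambiguous" in this sketch

@dataclass
class Rule:
    offset: int         # -1 preceding, 0 element in question, +1 following
    test: str           # "category" or "identity" (operations a and b)
    operand: int        # number compared with the tested element
    new: tuple          # (category, identity, ambiguity) assigned on success
    next_rule: int      # ambiguity number of the next rule; 0 ends the chain

def resolve(elements, i, rules, max_steps=50):
    """Follow the rule chain for elements[i] until a test is positive."""
    number = elements[i].ambiguity
    for _ in range(max_steps):          # guard against circular chains
        rule = rules.get(number)
        if rule is None:
            break
        j = i + rule.offset
        if 0 <= j < len(elements):
            tested = elements[j]
            value = tested.category if rule.test == "category" else tested.identity
            if value == rule.operand:
                elements[i] = Element(*rule.new)
                break                   # the next rule is not invoked after a positive test
        number = rule.next_rule
    return elements[i]

# Hypothetical category numbers and rules for an ambiguous French "la".
NOUN, VERB, ARTICLE, PRONOUN = 1, 2, 4, 5
rules = {
    7: Rule(+1, "category", VERB, (PRONOUN, 101, 0), next_rule=8),
    8: Rule(+1, "category", NOUN, (ARTICLE, 100, 0), next_rule=0),
}
sentence = [Element(0, 100, 7), Element(NOUN, 300, 0)]
resolved = resolve(sentence, 0, rules)
```

In the example the first rule fails (the following element is not a verb), the second rule is invoked and succeeds, and the ambiguous element is identified as an article.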
The Phrase Dictionary 


About 250 phrase structures that could be encountered in the
grammatical hierarchy were prepared in advance. To each was assigned an
identity number. The phrases were grouped into categories according 
to their function in the clause and given an appropriate category number. 
They were then put on cards and ordered by increasing alphabetical 


value of the structure.

6.7.- The Clause Rules


The rules for dividing a sentence containing two or more
conjugated verbs into its constituent clauses were composed in terms of
certain critical clause elements. In the program P 5 the phrase
sequence elements corresponding to possible conjunctive elements are
arranged in a skeletal sequence. The list of clause rules contains
about 50 possible skeletal sequences, representing possible phrase
sequences up to the second verb. Four numbers delimit a clause
within the phrase sequence, thus permitting the abstraction of the
clause from the phrase sequence. A fifth number gives the category
of the clause. The procedure may be iterated for sentences with more
than two clauses.


Two of the numbers mentioned give, respectively, the two limiting
elements of the clause within the skeletal sequence. The other two
numbers determine whether the limiting elements are included in or
excluded from the part of the phrase sequence making up the clause.
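A sketch of how five such numbers could delimit a clause is given below. The encoding is assumed for the illustration: the rule is taken to be a tuple of two skeleton positions, two inclusion flags and a category number, and the skeleton is represented by the positions of its elements in the phrase sequence. The phrase labels and the category number 30 are hypothetical.

```python
def extract_clause(phrases, skeleton_positions, rule):
    """Cut one clause out of a phrase sequence.

    phrases: the phrase sequence of the sentence.
    skeleton_positions: index in `phrases` of each skeletal element.
    rule: (first, last, include_first, include_last, category), where
          first and last index the skeletal sequence (assumed encoding).
    """
    first, last, include_first, include_last, category = rule
    start = skeleton_positions[first]
    stop = skeleton_positions[last]
    if not include_first:
        start += 1          # clause begins after the limiting element
    if include_last:
        stop += 1           # clause extends through the limiting element
    return category, phrases[start:stop]

# Subject, verb, object, conjunction, subject, verb.
phrases = ["S", "V", "O", "que", "S", "V"]
skeleton = [1, 3, 5]        # first verb, conjunction, second verb
clause = extract_clause(phrases, skeleton, (1, 2, True, True, 30))
```

Here the clause runs from the conjunction through the second verb, both limits included.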
6.8.- Word Corrections


One card was prepared for each element type to be modified in 


P 3. The cards contained the following information: 


1 - the uncorrected spelling of the element,

2 - the corrected spelling of the word, where appropriate,

3 - the corrected identity, category, and ambiguity numbers
where appropriate,

4 - an operator number to determine one of three operations
on the elements to be corrected:

a) modification of the element as indicated,

b) elimination of the element,

c) insertion of the indicated number of extra elements.


The operation specified on the word correction card was carried
out for each of the corresponding elements of the uncorrected tape
"Alphabetic Words".

The word corrections were sorted alphabetically to correspond
to the order of "Alphabetic Words". For each correction calling for
insertions, the elements to be inserted were included, one per card,
immediately after the correction card.
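The three operations can be sketched as follows. The dictionary layout of the records and the function name apply_corrections are hypothetical; in particular, the sketch assumes that an insertion keeps the original element and places the extra elements after it, which the report does not state. A lookup table replaces the merge of two alphabetically sorted files.

```python
def apply_corrections(words, corrections):
    """Apply modify / eliminate / insert operations to a list of word records.

    words: list of dicts, each with at least a "spelling" key.
    corrections: dict keyed by uncorrected spelling (assumed layout).
    """
    corrected = []
    for word in words:
        card = corrections.get(word["spelling"])
        if card is None:
            corrected.append(word)                      # no correction card
        elif card["op"] == "modify":
            corrected.append({**word, **card["fields"]})
        elif card["op"] == "eliminate":
            continue
        elif card["op"] == "insert":
            corrected.append(word)                      # assumption: original kept
            corrected.extend(card["extra"])
    return corrected

words = [
    {"spelling": "a", "identity": 1},
    {"spelling": "b", "identity": 2},
    {"spelling": "c", "identity": 3},
]
corrections = {
    "a": {"op": "modify", "fields": {"identity": 9}},
    "b": {"op": "eliminate"},
    "c": {"op": "insert", "extra": [{"spelling": "d", "identity": 4}]},
}
result = apply_corrections(words, corrections)
```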
Block Titles 


The headings chosen for the output result tables were put onto 


cards in the order and position needed for the output. 


Corrected Sigla 


Each verified sigla card contained a siglum as defined (ref. 4.1.
and Table 1), and the order number of the word preceding it in the text.


The sigla cards were sorted in textual order. 


Sigla Titles 


This card deck was prepared to control the output result tables of
program R 2 (b). Each card contained four possible kinds of
information:

1 - a number indicating the sub-series of program instructions
appropriate for the calculation of the desired measurement,

2 - numbers indicating the data of "Sigla Tables" entering into
the calculations,

3 - the heading desired for the line of results to be printed,

4 - a number controlling the line and page spacing of the printed
results.

Problems Encountered During Trials


The discussion of the analysis and of the input data assumes the
workability of the system. In fact, it treats the system in its ultimate state,
after its trial and subsequent modification. A number of problems arising
during the trials retarded the realization of the project.
Programming Problems. 


The programs were first prepared for use with a machine memory of
40,000 character positions. Practically, this relatively low limit meant
breaking up the logical sequence of machine operations into an awkward series
of programs. That is to say, it took more than the ideal number of passes on
the machine to handle the input material. It meant as well that the
linguistic data had to be organized into dense tables and entered in small portions
into the memory. As a result the first trial production was extremely
uneconomical; it required two and a half times the machine time required by later
productions.


The problem of machine memory was solved by the addition of another 
memory unit of 20,000 character positions. To capitalize on the increased 
resources of the computer, the programs had to be re-organized. Parts of 
the programs were rewritten, others omitted. The transformed programs had 


to be retested, and a second trial run produced. 
Program Testing 


The functioning of each program was verified in a series of tests on 
the computer. After each testing run, the results were examined for errors 


due to faulty instructions. The program was subsequently modified, then
retested. This "debugging", as it is called, is standard procedure.


This stage in the preparation of the programs proved far more lengthy
than anticipated. First of all, the complexity of the instructions
necessary for non-numeric data handling served to increase the incidence of error
and the difficulty of correction. For instance, a faulty instruction may
cause the obliteration of another instruction well removed from it; when
invoked, the missing instruction will cause a program halt, but will not
reveal the source of the error. Only by lengthy and tedious examination of
the program instructions and of the test results could certain errors be
traced. Had the programs been simpler, a short inspection would have
sufficed.


The sharing of machine time imposed serious restrictions on the rhythm
of program testing. A minimum lapse of one day could be expected between
the submission of a program for testing and the reception of the test
results. Counting correction time, a series of ten or twelve tests could
amount to a month for a single program. As much as possible, the programs
were tested concurrently, but where the output of a program was needed as
input to the succeeding one, the latter could only be tested consecutively.


During the testing of the programs, the conditions of machine
utilization were never near the level achieved during the period of production.
First, there was only one operator available to run the computer. During
periods of heavy demand for machine time, it could take up to a week for a
job to be treated. When inevitable operator absences or machine
malfunctioning caused the loss of a day, it could be several extra days before
the accumulated work was done.

The programs that were modified when the extra memory unit arrived
had to be retested. Inevitably, the new unit temporarily impeded normal
machine usage while its own defects were being tested and eliminated.
A month after the installation, several of the programs could still not
be used due to errors generated by the new equipment.
The Trial Run 


With the programs finally working, the system had to be subjected 
to a test case in order to judge the adequacy of the input linguistic 
data and the appropriateness of the program logic. The trial run exposed 
several problems that had to be solved before serious production could
start.


On the basis of the trial results, the procedure for phrase and
clause identification was modified to decrease production time and
improve the results. It also turned out that the input data for phrase
identification was not sufficiently extensive to handle the phrases
encountered. The data file was enlarged accordingly.


Some of the measurements to be tabulated in the results were
tentative. In particular, the figures for the productivity (ref. 2.2.)
moved us to replace its direct measurement with the logarithmic function
of its value. Other minor changes in the measurements and their
representation were incorporated into the programs.
Interpretation of the Results 


The immediate results as put out by the computer may be on punched 
cards or magnetic tape rather than on printed pages. They are transferred 
to paper wherever readable lists are desired. Hence all the results are


treated as if in readable form. 
The Working Lists 


After each step of the analysis any or all of the output data may be 
considered as intermediate results. As indicated in Figure 1, the working 
lists "Word Corrections", "Identified Words", and "Textual Sigla" are
indispensable for the correction of the data at crucial points in the analysis.


Their use is discussed in Chapter 5. 


A full concordance of the words, punctuation, word endings and
structures of each method exists in the lists "Concorded Identities" and
"Concorded Structures". These two lists are useful for detailed examination
of the method at the level of its grammatical elements, but are extremely
voluminous.
The Type Lists 


The lists "Identity Types" and "Structure Types" summarize the
concordances of the words, punctuation, word endings and structures. Each
graphically different element is listed with its identifying information and
individual token count for the whole method. A copy of the lists is included in
Appendix I (1) with a key to the information contained in them.

(1) Appendix I consists of machine print-out; hence, it exists in only
one copy.

The Block Results


The intake and the productivity were calculated at 500-word
intervals throughout the text. The results are tabulated in "Textual Results".
A sample list is presented in Appendix I.
The Productivity 


The productivity degree is the logarithm to the base 10 of the
productivity. In Table 6 the total degree is the sum of the
productivity degrees of the structures, and the average degree is their
arithmetic mean.


For example, the value 5.1 for the average degree of the phrases
in Table 6.- (a) may be interpreted as meaning a combinatorial
productivity of about 100,000 (10^5.1) for each phrase, or of 17,500,000
(176 x 10^5.1) for all the phrases of the method.
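The relation between the total and average degrees can be checked with a few lines of arithmetic. The figures below are those printed in Table 6.- (a); the observation that the printed averages agree with the quotient truncated (rather than rounded) to one decimal is an inference from the figures, not a statement of the report, and the helper names are hypothetical.

```python
import math

# (number of structures, total degree) as printed in Table 6.- (a)
table_6a = {
    "PHRASES": (176, 913),
    "CLAUSES": (1641, 20980),
    "SENTENCES": (38, 656),
    "TOTAL, STRUCTURES": (1855, 22549),
}

def average_degree(elements, total_degree):
    """Arithmetic mean of the productivity degrees."""
    return total_degree / elements

def truncate(value, places=1):
    """Drop digits beyond the given number of decimals (assumed convention)."""
    factor = 10 ** places
    return math.floor(value * factor) / factor

averages = {
    name: truncate(average_degree(n, total))
    for name, (n, total) in table_6a.items()
}

# Productivity corresponding to an average degree of 5.1:
# of the order of 100,000 combinations per phrase.
productivity_per_phrase = 10 ** averages["PHRASES"]
```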


The Intake 


The intake was calculated by dividing the number of new types within
a given category by the number of its tokens in the 500-word block. Thus
it is not strictly proportional to the amount of new material. High
values may be due to a large number of new types or a low number of
repetitions. The ambivalence was eliminated from the graphic representation
(ref. 8.5.).
The Global Results 


All of the measurements, including the intake and the productivity, 
were tabulated for the method globally in "Sigla Results" and "Textual 


Results". They appear in Tables 3 to 11 for the two methods analysed.

A number of the measurements depend on the analysis of the coded
sigla. The process of pre-editing proved to be somewhat unsatisfactory.
The diversity of the manual operations demanded at the pre-editing stage
complicated the task for the recent initiates doing it. The
multiplicity of possible sigla combinations contributed to the difficulty.
Because of the tentative nature of the first analyses, the pre-editing
was verified only once. The obvious coding errors were taken up at the
sigla correction stage; the rest remained.


In Table 8.- (b) a systematic error of interpretation caused most
of the verbal contextualization to be counted as "list" rather than as
"prose" or "dialog".


In Tables 8, 9, and 11 the results must be considered approximate.
The multiple repetition in Table 10 is approximate as well.
The Graphic Results 


Figures 2 to 10 are graphic representations of the results in Tables
3 to 11 and of the block results (ref. 8.3.). The following indications
eliminate possible ambiguities of interpretation.

1. In Figure 3, the values of the range were multiplied
by a factor of 3 to enhance their presentation.

2. The number of new types rather than the type-token
ratio was chosen to represent the intake in Figure
4. This simplifies the interpretation of the graph:
where high, there are many new types; where low, few.

3. In Figure 4, the new types were counted within distinct,
consecutive, 1000-word segments of running text.

4. The height of the columns in Figure 6 represents the
proportion of the sentences presenting one or more new
elements.

5. Being based on the approximate results of the corresponding
tables, Figures 7, 8 and 10 are approximate. The
multiple repetition of Figure 9 is approximate as well.

6. The multiple repetition of Figure 9 represents the extra
repetition due to textual material duplicated in media other
than the ones punched onto cards.
TABLE 3.- (a)

VOIX ET IMAGES DE FRANCE               SELECTION, QUANTITY
CATEGORY                               TYPES
                                       (NUMBER)
NOUNS                                   814
VERBS                                   273
ADJECTIVES                              147
ADVERBS                                [illegible]
TOTAL, VOCABULARY                      1254
TOTAL, GRAMMAR WORDS                    183
TOTAL, WORD ENDINGS                      57
PHRASES                                 176
CLAUSES                                1641
SENTENCES                                38
TOTAL, STRUCTURES                      1855

TOTAL, GLOBAL                          3349
TABLE 3.- (b)

FRENCH THROUGH PICTURES                SELECTION, QUANTITY
CATEGORY                               TYPES
                                       (NUMBER)
NOUNS                                   445
VERBS                                    99
ADJECTIVES                              107
ADVERBS                                   3
TOTAL, VOCABULARY                       654
TOTAL, GRAMMAR WORDS                    140
TOTAL, WORD ENDINGS                      46
PHRASES                                 138
CLAUSES                                1070
SENTENCES                                34
TOTAL, STRUCTURES                      1242

TOTAL, GLOBAL                          2082
TABLE 4.- (a)

VOIX ET IMAGES DE FRANCE     SELECTION, PROPORTION AND UTILITY
CATEGORY             TYPES        PROPORTION   FREQUENCY    RANGE
                     (NUMBER)     (PERCENT)    (AVERAGE)    (AVERAGE)
NOUNS                  814          64.9       [illegible]  [illegible]
VERBS                  273       [illegible]     108.6      [illegible]
ADJECTIVES             147       [illegible]  [illegible]     15.1
ADVERBS             [illegible]      1.3      [illegible]  [illegible]

TOTAL, VOCABULARY     1254         100.0         45.5         13.0
TABLE 4.- (b)

FRENCH THROUGH PICTURES      SELECTION, PROPORTION AND UTILITY
CATEGORY             TYPES        PROPORTION   FREQUENCY    RANGE
                     (NUMBER)     (PERCENT)    (AVERAGE)    (AVERAGE)
NOUNS                  445          68.0         355.0        12.8
VERBS                   99       [illegible]  [illegible]  [illegible]
ADJECTIVES             107       [illegible]      44.8        13.6
ADVERBS                  3           0.4          44.0     [illegible]

TOTAL, VOCABULARY      654         100.0      [illegible]     16.2
TABLE 5.- (a)

VOIX ET IMAGES DE FRANCE                    INTAKE, GLOBAL
CATEGORY                 TYPES        TOKENS       INTAKE
                         (NUMBER)     (NUMBER)     (PERCENT)
NOUNS                      814          7415
VERBS                      273          4069
ADJECTIVES                 147       [illegible]
ADVERBS                 [illegible]      129
TOTAL, VOCABULARY         1254         12932
TOTAL, GRAMMAR WORDS       183         21183
TOTAL, WORD ENDINGS         57          3720
PHRASES                    176         22072
CLAUSES                   1641          6288
SENTENCES                   38          5457
TOTAL, STRUCTURES         1855         33817

TOTAL, GLOBAL             3349         71652
TABLE 5.- (b)

FRENCH THROUGH PICTURES                     INTAKE, GLOBAL
CATEGORY                 TYPES        TOKENS       INTAKE
                         (NUMBER)     (NUMBER)     (PERCENT)
NOUNS                      445          6719
VERBS                   [illegible]     1926
ADJECTIVES                               169
ADVERBS                                   25
TOTAL, VOCABULARY
TOTAL, GRAMMAR WORDS
TOTAL, ENDINGS
PHRASES
CLAUSES
SENTENCES
TOTAL, STRUCTURES

TOTAL, GLOBAL
TABLE 6.- (a)

VOIX ET IMAGES DE FRANCE                 PRODUCTIVITY, GLOBAL
CATEGORY             ELEMENTS   TOTAL DEGREE   AVERAGE DEGREE
PHRASES                   176            913              5.1
CLAUSES                  1641          20980             12.7
SENTENCES                  38            656             17.2

TOTAL, STRUCTURES        1855          22549             12.1
TABLE 6.- (b)

FRENCH THROUGH PICTURES                  PRODUCTIVITY, GLOBAL
CATEGORY             TYPES      TOTAL INDEX    AVERAGE INDEX
PHRASES                   138          630.6              4.5
CLAUSES                  1070        14047.              13.1
SENTENCES                  34          479.6       [illegible]

TOTAL, STRUCTURES        1242        15157.              12.2
TABLE 7.- (a)

VOIX ET IMAGES DE FRANCE
CATEGORY                 DEGREE 0
                         (PERCENT OF ALL SENTENCES)
TOTAL, VOCABULARY
TOTAL, GRAMMAR WORDS
TOTAL, WORD ENDINGS
TOTAL, STRUCTURES

TOTAL, GLOBAL
TABLE 7.- (b)

FRENCH THROUGH PICTURES
CATEGORY                 DEGREE
TOTAL, VOCABULARY
TOTAL, GRAMMAR WORDS
TOTAL, WORD ENDINGS
TOTAL, STRUCTURES

TOTAL, GLOBAL
TABLE 8.- (a)

VOIX ET IMAGES DE FRANCE                 PRESENTATION

PICTURE CONTEXT          NUMBER       PERCENT      PER THOUSAND
                         OF UNITS     OF TOTAL     TEXT WORDS

TOTAL PICTURES             2194         99.9          63.0
WITH LEGEND                2194         99.9          63.0
WITHOUT LEGEND                0           .0            .0
FOR DISTRIBUTION              0           .0            .0
IN MANUALS                    0           .0            .0
FOR DISPLAY                2099         95.6          60.2
FOR FIXED PROJECTION       2099         95.6          60.2
AS MINUTES OF FILM            0           .0            .0

DIFFERENTIAL CONTEXT

TRANSLATION                   0           .0            .0
EXPLANATION                   0           .0            .0
INSTRUCTIONS                  0           .0            .0

VERBAL CONTEXT

TOTAL TEXT WORDS          34820        100.0
DIALOG                    22998         66.1
PROSE                      3862      [illegible]
SONG AND VERSE                0           .0
LIST                       7930         22.7
TABLE 8.- (b)

FRENCH THROUGH PICTURES                  PRESENTATION

PICTURE CONTEXT          NUMBER       PERCENT      PER THOUSAND
                         OF UNITS     OF TOTAL     TEXT WORDS

TOTAL PICTURES             1676         99.9       [illegible]
WITH LEGEND                1171         69.8          42.9
WITHOUT LEGEND              505      [illegible]      18.5
FOR DISTRIBUTION              0           .0            .0
IN MANUALS                  668         39.8          24.4
FOR DISPLAY                1862        111.0          68.2
FOR FIXED PROJECTION       1004         59.9          36.7
AS MINUTES OF FILM            0           .0            .0

DIFFERENTIAL CONTEXT

TRANSLATION                  94         99.9           3.4
EXPLANATION                  94         99.9           3.4
INSTRUCTIONS                  0           .0            .0

VERBAL CONTEXT

TOTAL TEXT WORDS          27289        101.4
DIALOG                 [illegible]        .0
PROSE                       975          3.6
SONG AND VERSE               25           .0
LIST                      25878         96.2
TABLE 9.- (a)

VOIX ET IMAGES DE FRANCE             PRESENTATION, INTRODUCTION
CATEGORY                PRESENTATION   REPETITION    NON-SYNTACTIC
                        (PERCENT OF ALL NEW TYPES)
TOTAL, VOCABULARY           58.2          39.4       [illegible]
TOTAL, GRAMMAR WORDS     [illegible]   [illegible]       3.8
TOTAL, WORD ENDINGS         59.6          38.5           1.7
TOTAL, STRUCTURES           42.6          57.3            .0
TOTAL, GLOBAL               50.4       [illegible]   [illegible]
TABLE 9.- (b)

FRENCH THROUGH PICTURES              PRESENTATION, INTRODUCTION
CATEGORY                PRESENTATION   REPETITION    NON-SYNTACTIC
                        (PERCENT OF ALL NEW TYPES)
TOTAL, VOCABULARY
TOTAL, GRAMMAR WORDS
TOTAL, WORD ENDINGS
TOTAL, STRUCTURES
TOTAL, GLOBAL

Surviving values (rows and columns not recoverable): 60.5, 62.8, 47.1
TABLE 10.- (a)

VOIX ET IMAGES DE FRANCE                 REPETITION BY CATEGORY
CATEGORY                 TYPES        SIMPLE
                                      TOKENS
                         (NUMBER)     (NUMBER)
NOUNS                      814          7415
VERBS                      273          4069
ADJECTIVES                 147
ADVERBS                 [illegible]      129
TOTAL, VOCABULARY         1254         12932
TOTAL, GRAMMAR WORDS       183         21183
TOTAL, WORD ENDINGS         57          3720
PHRASES                    176         22072
CLAUSES                   1641          6288
SENTENCES                   38
TOTAL, STRUCTURES         1855

TOTAL, GLOBAL             3349
TABLE 10.- (b)

FRENCH THROUGH PICTURES                  REPETITION BY CATEGORY
CATEGORY      TYPES      SIMPLE     SIMPLE    MULTIPLE   MULTIPLE
                         TOKENS     REP.      TOKENS     REP.
              (NUMBER)   (NUMBER)   (AVE.)    (NUMBER)   (AVE.)
TABLE 11.- (a)

VOIX ET IMAGES DE FRANCE                 REPETITION, DISTRIBUTION
ACCORDING TO SKILL       NUMBER       PERCENT      PER HUNDRED
                         OF TOKENS    OF TOTAL     TEXT WORDS
TOTAL WORDS OF METHOD     48112         99.9         138.1
LISTENING                 21591         44.8          62.0
TABLE 11.- (b)

FRENCH THROUGH PICTURES                  REPETITION, DISTRIBUTION
ACCORDING TO SKILL       NUMBER       PERCENT      PER HUNDRED
                         OF TOKENS    OF TOTAL     TEXT WORDS
TOTAL WORDS OF METHOD  [illegible]      99.9         176.4
LISTENING                 20314         42.1      [illegible]
READING                   26951         55.9          98.7
SPEAKING                  28296         58.7         103.6
WRITING                   14432         29.9          52.8

ACCORDING TO MEDIUM
PRINTED                   26951         55.9          98.7

ACCORDING TO VARIETY
FIGURE 5.- (a) PRODUCTIVITY OF THE STRUCTURES

FIGURE 6.- (a) DENSITY BY SENTENCE

Conclusions

The evaluation of the methods in terms of their pedagogical
factors is beyond the immediate scope of this project. We can, however,
assess the numerical results as to their appropriateness, and the
system of analysis as to its value.
Definition of the Measurements 


The intake as defined is too complex a variable to be readily
meaningful. Redefining it as the ratio of new types to running words
within a segment of text has the following advantages:

1. the order of magnitude of the value is independent of the
arbitrarily chosen length of segment,

2. the values are independent of the repetition, or number of
tokens corresponding to the types,

3. the results are more easily interpreted in terms of gradation:
the higher the value, the steeper the gradation.
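The difference between the two definitions can be seen in a small numerical sketch. The function names and the sample figures are hypothetical; the first function follows the definition used in the analysis (new types divided by the tokens of the same category in the block), the second the redefinition proposed here (new types divided by the running words of the segment).

```python
def intake_as_defined(new_types, category_tokens):
    """New types of a category divided by its tokens in the block."""
    return new_types / category_tokens

def intake_redefined(new_types, running_words):
    """New types divided by the running words of the segment."""
    return new_types / running_words

# Two hypothetical 500-word blocks, each introducing 10 new nouns,
# but with 20 noun tokens in the first and 100 in the second.
first = intake_as_defined(10, 20)      # little repetition: high value
second = intake_as_defined(10, 100)    # much repetition: low value

# The redefined intake is the same for both blocks.
same = intake_redefined(10, 500)
```

The same amount of new material gives 0.5 and 0.1 under the original definition, but 0.02 in both cases under the proposed one.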


The total productivity is another complex variable. It depends
on the number of structures, the lengths of the structures, and the
number of types of each element of the structures. Averaging
eliminates the effect of the number of structures.


The non-logarithmic productivity is so heavily weighted by the
most productive structures that the results lose significance for the
structures as a whole. A logarithmic function reduces the relative
importance of larger productivity values, making the average
productivity more representative of all the structures. By choosing a
logarithmic base other than ten, discrimination between high and low
productivities may be varied.
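The weighting effect can be illustrated with invented figures. The four productivity values below are hypothetical and serve only to show how a single highly productive structure dominates the ordinary average but not the average of the logarithms.

```python
import math
from statistics import mean

# Hypothetical productivities of four structures.
productivities = [10, 100, 1_000, 1_000_000]

# Ordinary average: dominated by the largest value.
raw_average = mean(productivities)

# Average degree to the base 10: each structure weighs equally.
average_degree = mean(math.log10(p) for p in productivities)

# The same average expressed to the base 2 spreads the values further apart.
average_degree_base_2 = mean(math.log2(p) for p in productivities)
```

The ordinary average is about 250,000, a value that describes none of the first three structures, whereas the average degree of 3 corresponds to a productivity of 1,000.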
9.2.- Proposed Measurements


The system of analysis could be adapted to produce additional 


results with only minimal modification. 


9.2.1.- Phonetic Description

An analysis of the selection, gradation and repetition of the
phonetic elements of the words is feasible. To the data files
"Vocabulary Words" and "Functional Words" would have to be added the
coded pronunciation of each word. With a few additional instructions
the existing programs would suffice.


9.2.2.- Utility Parameters

Other criteria such as availability and semantic coverage could
be used to assess the vocabulary. The appropriate values would be
assigned to the entries of "Vocabulary Words".


9.2.3.- Repetition Profile

In addition to the amount of repetition of an element it would
be possible to assess the distribution of its repetition throughout
the text. The profile could be expressed by a vector of several
values, each representing the repetition within a certain proximity of
the first occurrence. The profiles could be averaged for the whole
text.
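One possible form of such a profile is sketched below. The choice of proximity bounds (here 100 and 1,000 words from the first occurrence), the decision to ignore repetitions beyond the last bound, and the function names are all assumptions; the proposal itself leaves these details open.

```python
def repetition_profile(positions, bounds):
    """Count repetitions falling within each proximity of the first occurrence.

    positions: increasing order numbers of the occurrences of one element.
    bounds: increasing distances from the first occurrence (assumed bins).
    """
    first = positions[0]
    profile = [0] * len(bounds)
    for position in positions[1:]:
        distance = position - first
        for k, bound in enumerate(bounds):
            if distance <= bound:
                profile[k] += 1
                break               # each repetition is counted once
    return profile

def average_profile(profiles):
    """Average the profiles of several elements, value by value."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]

# An element met at words 5, 10, 60, 400 and 5000.
profile = repetition_profile([5, 10, 60, 400, 5000], bounds=[100, 1000])
averaged = average_profile([profile, [0, 3]])
```

In the example two repetitions fall within 100 words of the first occurrence, one more within 1,000 words, and the last lies outside the profile.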
9.2.4.- Detailed Results

The results presented by grammatical category could be augmented
to include further subdivisions of the grammar words, word endings and
structures.


The Presentation of the Results 


A number of modifications in the format and grouping of the 


results would add to their usefulness. 


Word Type List 


The list "Word Types" could be modified to include only one 
inflectional form for each word. The word's repetition count would 
then be more significant. The list would be shorter, hence easier
to use.


Type Evaluation 


The list "Word Types" could easily include the values of utility 
for each vocabulary word. Similarly, the list "Structure Types" could 
include the productivity. This would enhance their usefulness for


detailed examination of the selection. 


Averages and Dispersion 


We made frequent use of arithmetic means to summarize long lists 
of figures. These averages would be more useful if complemented by 


some measure of the dispersion of the values about the mean. 
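As an illustration, the standard deviation is one such measure; it is chosen here only as an example, the report not naming a particular one. The sample values and the function name are hypothetical.

```python
from statistics import mean, pstdev

def summarize(values):
    """Return the arithmetic mean and the standard deviation about it."""
    return mean(values), pstdev(values)

# Hypothetical list of values, e.g. repetition counts of eight words.
average, dispersion = summarize([2, 4, 4, 4, 5, 5, 7, 9])
```

For these values the mean is 5 and the standard deviation 2, which tells the reader that most of the values lie between 3 and 7.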


The System of Analysis 


Any modification of the system of analysis that minimizes manual 
intervention, saves production time or improves the output results 


ig an improvement. A certain number are envisagede 
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Gotele= Pre—Editing 


Ge4e2c0e 


The identification of the pedagogical units of text (ref. 4.1)
with coded sigla should not be included in the pre-editing stage.
The sigla should not be included in the text, but in an independent
card deck. The analyst could select prepared sigla cards, indicating
on each the order number of its position in the text. The
results of the sigla analysis would then be collated with the text
at a late stage in the analysis. This procedure offers several
advantages:

1. It eliminates handwritten entries in the original text.
   The key punch operators work with a simpler, cleaner text
   and hence make fewer errors.

2. The proposed procedure is flexible. The sigla need not
   be coded before the punching up. Sigla errors are not
   irretrievably included in the textual data at "P" as is
   presently the case.

3. It is more economical. Clerical help can perform the
   simplified pre-editing. Coding and punching are speeded up.
   The text need not be scanned during the first machine pass
   to extract the sigla.
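
The collation of an independent sigla deck with the text might
proceed as sketched below. The representation of a card as a pair
(order number, siglum), the siglum codes and the function name are
assumptions made for the example.

```python
def collate_sigla(text_units, sigla_cards):
    """Attach to each text unit the sigla whose cards carry its
    order number. Order numbers are counted from 1."""
    by_position = {}
    for order_number, siglum in sigla_cards:
        by_position.setdefault(order_number, []).append(siglum)
    return [
        (unit, by_position.get(number, []))
        for number, unit in enumerate(text_units, start=1)
    ]

# Hypothetical text units and sigla cards (codes are invented).
units = ["unit one", "unit two", "unit three"]
cards = [(3, "EX"), (1, "DIAL")]
collated = collate_sigla(units, cards)
```

Since the cards are matched by order number, a miscoded siglum is
corrected by replacing one card; the punched text is left untouched.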


The Word Lists 


An extension of the list "Vocabulary Words" would reduce the
manual intervention and improve the measurement of the utility (ref.
304.358). The number of words to be identified during the manual
correction (Fig. 1.- (a)) would be reduced. The correction time could
easily be cut in half by doubling the length of the data list. With
the longer list, the utility parameters would be applied to a larger
proportion of the words.
The Elimination of Functional Ambiguities


The use of a special grammar to complete the identification of
homographic elements is awkward and time-consuming. Each word
must be tested for an ambiguity indicator, and, if positive, for a
series of possible context conditions. To be completely effective,
the list "Ambiguity Rules" would have to be much longer than at
present.


A possible solution would be to include the neighbouring elements
with each word listed as being ambiguous. During word correction,
the assigned identification can be verified and adjusted
in terms of each context encountered. This procedure could easily
include the reduction of idioms to the word level.
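
A minimal sketch of such a listing follows. It assumes that one
neighbour on each side is enough context; the width of the context
and the function name are choices made for the example only.

```python
def ambiguous_contexts(tokens, ambiguous_words):
    """List, for each ambiguous word, every distinct context
    (preceding element, word, following element) found in the text.
    An empty string marks the beginning or the end of the text."""
    contexts = {}
    for i, token in enumerate(tokens):
        if token not in ambiguous_words:
            continue
        before = tokens[i - 1] if i > 0 else ""
        after = tokens[i + 1] if i + 1 < len(tokens) else ""
        entry = (before, token, after)
        seen = contexts.setdefault(token, [])
        if entry not in seen:
            seen.append(entry)
    return contexts

# Hypothetical example: "porte" may be a noun or a verb form.
text = "la porte est ouverte il la porte".split()
listing = ambiguous_contexts(text, {"porte"})
```

Each distinct context then appears once in the correction list, where
the assigned identification can be checked against it.
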
The Grammatical Analysis 


The present grammatical analysis is sufficient, but not ideal.
As employed, the three levels of structuration, phrase, clause and
sentence, do not give a compact description of the grammar. That
is, there are too many different structural types at the clause
level. For instance, of 1855 structure types in the example of Table
3.- (a), 1641 are clause structures.


One solution is to redefine the phrase structures to include
sequences of two or more dependent phrases. A series of prepositional
phrases, for instance, would constitute one phrase. This would
increase the number of phrase types and decrease the number of clause
types. For the chosen example, a total of 950 structures, with
about 600 phrase structures and 300 clause structures would render
the results more tractable.
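
The effect of this regrouping on the number of clause types can be
illustrated as follows. The phrase labels ("NP", "VP", "PP") and the
sample clauses are hypothetical; the sketch treats any unbroken
series of prepositional phrases as a single phrase.

```python
from itertools import groupby

def collapse_phrase_runs(clause, groupable=("PP",)):
    """Replace each unbroken series of a groupable phrase label
    by a single occurrence of that label."""
    collapsed = []
    for label, run in groupby(clause):
        length = len(list(run))
        if label in groupable:
            collapsed.append(label)  # the whole series counts as one phrase
        else:
            collapsed.extend([label] * length)
    return tuple(collapsed)

def count_types(clauses):
    """Number of distinct clause structures in a list of clauses."""
    return len(set(clauses))

# Three hypothetical clauses differing only in the number of
# prepositional phrases.
clauses = [
    ("NP", "VP", "PP"),
    ("NP", "VP", "PP", "PP"),
    ("NP", "VP", "PP", "PP", "PP"),
]
before = count_types(clauses)
after = count_types([collapse_phrase_runs(c) for c in clauses])
```

Here three clause types reduce to one, while the series of
prepositional phrases would be recorded as new types at the phrase
level.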


A second solution is the inclusion of a phrase-clause interlevel.
This fourth level would include the grouped phrases relegated to the
phrase level in the above solution. The total for the example might
be about 500 structures: 40 sentence structures, 250 clause
structures, 60 groups of phrases, and 150 phrase structures. It should be
possible to condense the description even further, increasing the
number of phrase groups at the expense of the number of clauses.

Economic Considerations


The practical feasibility of analysing methods depends on
development and production costs. The development of the system can be
broken down into the analysis of the problem, the preparation of the
data, and the programming of the machine. The cost in human terms
cannot be broken down, as these three aspects were intermingled.
Globally, two person-years were applied to the development of the
system. More time is needed to perfect it.


Production costs vary from one method to another, depending
on length and complexity. A rough break-down is given below in terms
of the minimal cost and a cost-per-word supplement.


1. pre-editing: 0.07 cents/word
2. card punching: 1.25 cents/word
3. machine time: $75.00 plus 1.00 cents/word
4. coordination and correction: $20.00
5. preparation of graphs: $20.00


The total amounts to $115 basic cost, plus 2.32 cents per word.
For example, a method with 30,000 words of text would cost about
$800.
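
The arithmetic can be checked with the short sketch below. It takes
the card-punching rate as 1.25 cents per word, the value consistent
with the stated total of 2.32 cents per word; the names used are
chosen for the example only.

```python
# Fixed charges in dollars, from the break-down above.
FIXED_DOLLARS = {
    "machine time": 75.00,
    "coordination and correction": 20.00,
    "preparation of graphs": 20.00,
}

# Supplements in cents per word. The card-punching rate is an
# inference from the stated total of 2.32 cents per word.
CENTS_PER_WORD = {
    "pre-editing": 0.07,
    "card punching": 1.25,
    "machine time": 1.00,
}

def production_cost(word_count):
    """Estimated production cost in dollars for a text of
    `word_count` words."""
    fixed = sum(FIXED_DOLLARS.values())
    variable = word_count * sum(CENTS_PER_WORD.values()) / 100.0
    return round(fixed + variable, 2)

cost_30000 = production_cost(30000)
```

For 30,000 words the result is $811: $115 of basic cost plus $696
of per-word supplement, in agreement with the round figure of $800.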


The cost of machine time depends on the installation. It is
hoped that machine cost will be reduced at least by half with the
impending installation of a more advanced machine.




